Abstract


We derive an asymptotic lower bound on the Bayes risk when N 
identical quantum systems whose state depends on a vector of unknown
parameters are jointly measured in an arbitrary way and the
parameters of interest estimated on the basis of the resulting data.
The bound is an integrated version of a quantum Cramer-Rao bound
due to Holevo (1982), and it thereby links the fixed N exact Bayesian
optimality usually pursued in the physics literature with the pointwise 
asymptotic optimality favoured in classical mathematical statistics. 
By heuristic arguments the bound can be expected to be sharp. This 
does turn out to be the case in various important examples, where it 
can be used to prove asymptotic optimality of interesting and useful 
measurement-and-estimation schemes. On the way we obtain a new 
family of “dual Holevo bounds” of independent interest. 


*This paper has appeared as R.D. Gill (2008), Conciliation of Bayes and pointwise
quantum state estimation, pp. 239-261 in Quantum Stochastics and Information:
Statistics, Filtering and Control, V.P. Belavkin and M. Guta, eds., World Scientific.
It was originally submitted to Annals of Statistics under the title “Asymptotic
information bounds in quantum statistics”. It was accepted subject to minor
corrections but with the suggestion to extend it with explanatory and background
material. Unfortunately I let the deadline pass, and now I would want to fully work
out the connection with Q-LAN developed by Mada Guta and Jonas Kahn in recent
papers and in Jonas Kahn’s Leiden PhD thesis.

†URL: www.math.leidenuniv.nl/~gill. Also affiliated with CWI, Amsterdam, the
Netherlands, www.cwi.nl.

1 Introduction


The aim of this paper is to derive asymptotic information bounds for “quantum
i.i.d. models” in quantum statistics. That is to say, one has N copies of
a quantum system each in the same state depending on an unknown vector
of parameters $\theta$, and one wishes to estimate $\theta$, or more generally a
vector function of the parameters $\psi(\theta)$, by making some measurement on the N
systems together. This yields data whose distribution depends on $\theta$ and on
the choice of the measurement. Given the measurement, we therefore have a
classical parametric statistical model, though not necessarily an i.i.d. model,
since we are allowed to bring the N systems together before measuring the
resulting joint system as one quantum object. In that case the resulting data
need not consist of (a function of) N i.i.d. observations, and a key quantum
feature is that we can generally extract more information about $\theta$ using such
“collective” or “joint” measurements than when we measure the systems
separately. What is the best we can do as $N \to \infty$, when we are allowed to
optimize both over the measurement and over the ensuing data-processing?

A heuristic, statistically motivated, approach to deriving methods with 
good properties for large N is to choose the measurement to optimize the 
Fisher information in the data, leaving it to the statistician to process the 
data efficiently, using for instance maximum likelihood or related methods, 
including Bayesian. This heuristic principle has already been shown to work 
in a number of special cases in quantum statistics. Since the measurement 
maximizing the Fisher information typically depends on the unknown
parameter value, this often has to be implemented in a two-step approach, first
using a small fraction of the N systems to get a first approximation to the 
true parameter, and then optimizing on the remaining systems using this 
rough guess. 

The approach favoured by many physicists is to choose a prior distribution 
and loss function on grounds of symmetry and physical interpretation, and 
then to exactly optimize the Bayes risk over all measurements and estimators, 
for any given N. This approach succeeds in producing attractive methods on
those rare occasions when a felicitous combination of all the mathematical 
ingredients leads to a simple and analytically tractable solution. Now it has 
been observed in a number of problems that the two approaches result in 
asymptotically equivalent estimators, though the measurement schemes can 
be strikingly different. Heuristically, this can be understood to follow from 
the fact that, in the physicists’ approach, for large N the prior distribution 
should become increasingly irrelevant and the Bayes optimal estimator close 
to the maximum likelihood estimator. Moreover, we expect those estimators 
to be asymptotically normal with variances corresponding to inverse Fisher
information.

Here we link the two approaches by deriving a sharp asymptotic lower 
bound on the Bayes risk of the physicists’ approach, in terms of the optimal 
Fisher information of the statisticians’ approach. This enables us to conclude 
the asymptotic optimality of some heuristically motivated measurement- 
and-estimation schemes by showing that they attain the asymptotic bound. 
Sometimes one can find in this way asymptotically optimal solutions which
are much easier to implement than the exactly optimal solution of the
physicists’ approach. On the other hand, it also shows (if only heuristically)
that the physicists’ approach, when successful, leads to procedures which are
asymptotically optimal for other prior distributions than those used in the
computation, also for loss functions only locally equivalent to their loss
function of choice, and also asymptotically optimal in a pointwise rather than a
Bayesian sense. 

We derive our main result by combining an existing quantum Cramer-Rao
bound (Holevo, 1982) with the van Trees inequality, a Bayesian Cramer-Rao
bound from classical statistics (van Trees, 1968; Gill and Levit, 1995). The
former can be interpreted as a bound on the Fisher information in an
arbitrary measurement on a quantum system, the latter is a bound on the Bayes
risk (for a quadratic loss function) in terms of the Fisher information in the
data. This means that our result and its proof can be understood without
any familiarity with quantum statistics. Of course, to appreciate the
applications of the result, some further appreciation of “what is a quantum
statistical model” is needed. The paper contains a brief summary of this; for more
information the reader is referred to the papers of Barndorff-Nielsen et al.
(2003), and Gill (2001). For an overview of the “state of the art” in quantum
asymptotic statistics see Hayashi (2005) which reprints papers of many
authors together with introductions by the editor.

Let us develop enough notation to state the main result of the paper and
compare it with the comparable result from classical statistics. Starting on
familiar ground with the latter, suppose we want to estimate a function
$\psi(\theta)$ of a parameter $\theta$, both represented by real column vectors of
possibly different dimension, based on N i.i.d. observations from a distribution
with Fisher information matrix $I(\theta)$. Let $\pi$ be a prior density on the
parameter space and let $\tilde G(\theta)$ be a symmetric positive-definite matrix
defining a quadratic loss function
$l(\hat\psi^{(N)}, \theta) = (\hat\psi^{(N)} - \psi(\theta))^\top \tilde G(\theta) (\hat\psi^{(N)} - \psi(\theta))$.
(Later we will use $G(\theta)$, without the tilde, in the special case when $\psi$
is $\theta$ itself). Define the mean square error matrix
$V^{(N)}(\theta) = E_\theta (\hat\psi^{(N)} - \psi(\theta))(\hat\psi^{(N)} - \psi(\theta))^\top$,
so that the risk can be written
$R^{(N)}(\theta) = \operatorname{trace} \tilde G(\theta) V^{(N)}(\theta)$. The Bayes risk is
$R^{(N)}(\pi) = E_\pi \operatorname{trace} \tilde G V^{(N)}$. Here, $E_\theta$ denotes
expectation over the data for given $\theta$, $E_\pi$ denotes averaging over
$\theta$ with respect to the prior $\pi$. The estimator $\hat\psi^{(N)}$ is
completely arbitrary. We assume the prior density to be smooth, compactly
supported and zero on the smooth boundary of its support. Furthermore
a certain quantity roughly interpreted as “information in the prior” must
be finite. Then it is very easy to show (Gill and Levit, 1995), using the van
Trees inequality, that under minimal smoothness conditions on the statistical
model,

$$\liminf_{N \to \infty} N R^{(N)}(\pi) \ge E_\pi \operatorname{trace} G I^{-1} \tag{1}$$

where $G = \psi'^\top \tilde G \psi'$ and $\psi'$ is the matrix of partial derivatives
of elements of $\psi$ with respect to those of $\theta$.

Now in quantum statistics the data depends on the choice of measurement
and the measurement should be tuned to the loss function. Given
a measurement $M^{(N)}$ on N copies of the quantum system, denote by
$\bar I_M^{(N)}$ the average Fisher information (i.e., Fisher information divided
by N) in the data. The Holevo (1982) quantum Cramer-Rao bound, as extended
by Hayashi and Matsumoto (2004) to the quantum i.i.d. model, can be
expressed as saying that, for all $\theta$, $G$, $N$ and $M^{(N)}$,

$$\operatorname{trace} G(\theta) \bigl(\bar I_M^{(N)}(\theta)\bigr)^{-1} \ge \mathcal{C}_G(\theta) \tag{2}$$

for a certain quantity $\mathcal{C}_G(\theta)$, which depends on the specification of
the quantum statistical model (state of one copy, derivatives of the state with
respect to parameters, and loss function $G$) at the point $\theta$ only, i.e., on
local or pointwise model features (see (7) below). According to as yet unpublished
work of M. Hayashi the bound is asymptotically sharp. The idea behind his work
is that locally, the quantum i.i.d. model is well approximated by a quantum
Gaussian location model, a quantum statistical problem for which the Holevo
bound is sharp (Holevo, 1982).

We aim to prove that under minimal smoothness conditions on the quantum
statistical model, and conditions on the prior similar to those needed in
the classical case, but under essentially no conditions on the
estimator-and-measurement sequence,

$$\liminf_{N \to \infty} N R^{(N)}(\pi) \ge E_\pi \mathcal{C}_G \tag{3}$$

where, as before, $G = \psi'^\top \tilde G \psi'$. The main result (3) is exactly the
bound one would hope for, from heuristic statistical principles, and one may also
expect it to be sharp, for the reasons mentioned above. In specific models of
interest, the right hand side is often easy to calculate. Various specific
measurement-and-estimator sequences, motivated by a variety of approaches, can
also be shown in interesting examples to achieve the bound. The restrictions on
the prior can often be relaxed by approximating the prior of interest, as we will
show in our examples.

It was also shown in Gill and Levit (1995) how, in the classical statistical
context, one can replace a fixed prior $\pi$ by a sequence of priors indexed
by N, concentrating more and more on a fixed parameter value $\theta_0$, at rate
$1/\sqrt{N}$. Following their approach would, in the quantum context, lead to the
pointwise asymptotic lower bounds

$$\liminf_{N \to \infty} N R^{(N)}(\theta) \ge \mathcal{C}_G(\theta) \tag{4}$$

for each $\theta$, for regular estimators, and to local asymptotic minimax bounds

$$\lim_{M \to \infty} \liminf_{N \to \infty} \sup_{\|\theta - \theta_0\| \le N^{-1/2} M} N R^{(N)}(\theta) \ge \mathcal{C}_G(\theta_0) \tag{5}$$

for all estimators, but we do not further develop that theory here. In classical
statistics the theory of Local Asymptotic Normality is the way to unify,
generalise, and understand this kind of result. We do not yet have a theory
of “Q-LAN” though there are indications that it may be possible to build
such a theory. The results we obtain here using more elementary tools do
give further support to the distant aim of building a Q-LAN theory.

The basic tools used in this paper have now all been mentioned, but as
we shall see, the proof is not a routine application of the van Trees inequality.
The missing ingredient will be provided by the following new dual bound to
(2): for all $\theta$, $K$, $N$ and $M^{(N)}$,

$$\operatorname{trace} K(\theta) \bar I_M^{(N)}(\theta) \le \mathcal{C}^K(\theta) \tag{6}$$

where $\mathcal{C}^K(\theta)$ actually equals $\mathcal{C}_G(\theta)$ for a certain $G$
defined in terms of $K$ (as explained in Theorem 2 below). This is an upper bound on
Fisher information, in contrast to (2) which is a lower bound on inverse Fisher
information. The new inequality (6) follows from the convexity of the sets of
information matrices and of inverse information matrices for arbitrary
measurements on a quantum system, and these convexity properties have a simple
statistical explanation. Such dual bounds have cropped up incidentally in quantum
statistics, for instance in Gill and Massar (2000), but this is the first time a
connection is established.

The argument for (6), and given that, for (3), is based on some general
structural features of quantum statistics, and hence it is not necessary to
be familiar with the technical details of the set-up. In the next section we
will summarize the i.i.d. model in quantum statistics, focussing on the key
facts which will be used in the proof of the dual Holevo bound (6) and of
our main result, the asymptotic lower bound (3). These proofs are given in
a subsequent section, where no further “quantum” arguments will be used.
In a final section we will give three applications, leading to new results on
some much studied quantum statistical estimation problems.


2 Quantum statistics: the i.i.d. parametric case

The basic objects in quantum statistics are states and measurements, defined
in terms of certain operators on a complex Hilbert space. To avoid technical
complications we restrict attention to the finite-dimensional case, already rich 
in structure and applications, when operators are represented by ordinary 
(complex) matrices. 


States and measurement. The state of a d-dimensional system is represented
by a $d \times d$ matrix $\rho$, called the density matrix of the state, having
the following properties: $\rho^* = \rho$ (self-adjoint or Hermitian),
$\rho \ge 0$ (non-negative), $\operatorname{trace}(\rho) = 1$ (normalized).
“Non-negative” actually implies “self-adjoint” but it does no harm to emphasize
both properties. $\mathbf{0}$ denotes the zero matrix; $\mathbf{1}$ will denote the
identity matrix.

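Example: when d = 2, every density matrix can be written in the form
$\rho = \frac12(\mathbf{1} + \theta_1\sigma_1 + \theta_2\sigma_2 + \theta_3\sigma_3)$ where

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

are the three Pauli matrices and where $\theta_1^2 + \theta_2^2 + \theta_3^2 \le 1$. □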
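
The qubit parametrization in this example can be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the helper name `qubit_density` and the sample vector are illustrative choices, not taken from the paper.

```python
import numpy as np

# The three Pauli matrices sigma_1, sigma_2, sigma_3.
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def qubit_density(theta):
    """Return rho = (1 + theta . sigma) / 2 for theta in the unit ball (hypothetical helper)."""
    theta = np.asarray(theta, dtype=float)
    if theta.shape != (3,) or np.dot(theta, theta) > 1 + 1e-12:
        raise ValueError("theta must be a 3-vector in the unit ball")
    rho = np.eye(2, dtype=complex)
    for t, s in zip(theta, SIGMA):
        rho = rho + t * s
    return rho / 2

rho = qubit_density([0.3, -0.2, 0.5])
# The eigenvalues of rho are (1 -/+ |theta|) / 2, so rho >= 0 exactly when |theta| <= 1.
eigenvalues = np.linalg.eigvalsh(rho)
```

The eigenvalue formula in the comment explains why the unit ball is the right parameter space: the smaller eigenvalue becomes negative as soon as the length of $\theta$ exceeds 1.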

“Quantum statistics” concerns the situation when the state of the system
$\rho(\theta)$ depends on a (column) vector $\theta$ of p unknown (real) parameters.

Example: a completely unknown two-dimensional quantum state depends
on a vector of three real parameters, $\theta = (\theta_1, \theta_2, \theta_3)^\top$,
known to lie in the unit ball. Various interesting submodels can be described
geometrically: e.g., the equatorial plane; the surface of the ball; a straight
line through the origin. More generally, a completely unknown d-dimensional state
depends on $p = d^2 - 1$ real parameters. □

Example: in the previous example the two-parameter case obtained by
demanding that $\theta_1^2 + \theta_2^2 + \theta_3^2 = 1$ is called the case of a
two-dimensional pure state. In general, a state is called pure if $\rho^2 = \rho$
or equivalently $\rho$ has rank one. A completely unknown pure d-dimensional state
depends on $p = 2(d - 1)$ real parameters. □

A measurement on a quantum system is characterized by the outcome
space, which is just a measurable space $(\mathcal{X}, \mathcal{B})$, and a positive
operator valued measure (POVM) $M$ on this space. This means that to each
$B \in \mathcal{B}$ there corresponds a $d \times d$ non-negative self-adjoint matrix
$M(B)$, together having the usual properties of an ordinary (real) measure
(sigma-additive), with moreover $M(\mathcal{X}) = \mathbf{1}$. The probability
distribution of the outcome of doing measurement $M$ on state $\rho(\theta)$ is given
by the Born law, or trace rule:
$\Pr(\text{outcome} \in B) = \operatorname{trace}(\rho(\theta) M(B))$. It can be seen
that this is indeed a bona-fide probability distribution on the sample space
$(\mathcal{X}, \mathcal{B})$. Moreover it has a density with respect to the finite real
measure $\operatorname{trace}(M(B))$.

Example: the most simple measurement is defined by choosing an orthonormal
basis of $\mathbb{C}^d$, say $\psi_1, \dots, \psi_d$, taking the outcome space to be
the discrete space $\mathcal{X} = \{1, \dots, d\}$, and defining
$M(\{x\}) = \psi_x \psi_x^*$ for $x \in \mathcal{X}$; or in physicists’ notation,
$M(\{x\}) = |\psi_x\rangle\langle\psi_x|$. One computes that
$\Pr(\text{outcome} = x) = \psi_x^* \rho(\theta) \psi_x = \langle\psi_x|\rho|\psi_x\rangle$.
If the state is pure then $\rho = \phi\phi^* = |\phi\rangle\langle\phi|$ for some
$\phi = \phi(\theta) \in \mathbb{C}^d$ of length 1 and depending on the parameter
$\theta$. One finds that
$\Pr(\text{outcome} = x) = |\psi_x^*\phi|^2 = |\langle\psi_x|\phi\rangle|^2$. □
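
As a small illustration of the trace rule for a basis measurement, the sketch below (assuming NumPy; the function name `born_probabilities`, the angle `t` and the chosen basis are hypothetical) computes the outcome distribution and compares it with the pure-state formula.

```python
import numpy as np

def born_probabilities(rho, basis):
    """Pr(outcome = x) = psi_x^* rho psi_x, where psi_x is column x of `basis`."""
    basis = np.asarray(basis, dtype=complex)
    # The columns must be orthonormal, so that the matrices M({x}) sum to the identity.
    if not np.allclose(basis.conj().T @ basis, np.eye(basis.shape[1])):
        raise ValueError("columns of basis must be orthonormal")
    probs = np.einsum("ix,ij,jx->x", basis.conj(), rho, basis)
    return probs.real

# Pure qubit state phi = (cos(t/2), sin(t/2)), an assumed one-parameter example.
t = 0.7
phi = np.array([np.cos(t / 2), np.sin(t / 2)], dtype=complex)
rho = np.outer(phi, phi.conj())

# Measurement basis (|0> + |1>)/sqrt(2), (|0> - |1>)/sqrt(2), stored as columns.
basis = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

probs = born_probabilities(rho, basis)
# For a pure state the same numbers are the squared overlaps |<psi_x|phi>|^2.
overlaps = np.abs(basis.conj().T @ phi) ** 2
```

For this particular state and basis the two outcome probabilities are $(1 \pm \sin t)/2$, so the data from this measurement carry information about `t`.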

So far we have discussed state and measurement for a single quantum system.
This encompasses also the case of N copies of the system, via a tensor
product construction, which we will now summarize. The joint state of N
identical copies of a single system having state $\rho(\theta)$ is
$\rho(\theta)^{\otimes N}$, a density matrix on a space of dimension $d^N$. A joint
or collective measurement on these
systems is specified by a POVM on this large tensor product Hilbert space.
An important point is that joint measurements give many more possibilities 
than measuring the separate systems independently, or even measuring the 
separate systems adaptively. 
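
The tensor product construction can be sketched with Kronecker products. This is only an illustration under the assumption that NumPy is available; `joint_state` and the sample matrix are hypothetical.

```python
import numpy as np
from functools import reduce

def joint_state(rho, n):
    """Density matrix of n identical copies: the n-fold Kronecker power of rho."""
    return reduce(np.kron, [rho] * n)

# An arbitrary valid qubit density matrix (trace 1, positive determinant).
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
rho_3 = joint_state(rho, 3)  # acts on a space of dimension 2**3 = 8

# Measuring each copy separately in the computational basis yields i.i.d. outcomes:
# the diagonal of the joint state is the product of the single-copy distributions.
p_single = np.diag(rho).real
p_joint = np.diag(rho_3).real
```

Separate measurements of this kind only ever use product-form POVM elements; a collective measurement may use any POVM on the $d^N$-dimensional space, which is the extra freedom the text refers to.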

Fact to remember 1. State plus measurement determines probability dis¬ 
tribution of data. 


Quantum Cramer-Rao bound. Our main input is going to be the Holevo
(1982) quantum Cramer-Rao bound, with its extension to the i.i.d. case due
to Hayashi and Matsumoto (2004).


Precisely because of quantum phenomena, different measurements,
incompatible with one another, are appropriate when we are interested in
different components of our parameter, or more generally, in different loss
functions. The bound concerns estimation of $\theta$ itself rather than a function
thereof, and depends on a quadratic loss function defined by a symmetric
real non-negative matrix $G(\theta)$ which may depend on the actual parameter
value $\theta$. For a given estimator $\hat\theta^{(N)}$ computed from the outcome of
some measurement $M^{(N)}$ on N copies of our system, define its mean square error
matrix $V^{(N)}(\theta) = E_\theta (\hat\theta^{(N)} - \theta)(\hat\theta^{(N)} - \theta)^\top$.
The risk function when using the quadratic loss determined by $G$ is
$R^{(N)}(\theta) = E_\theta (\hat\theta^{(N)} - \theta)^\top G(\theta) (\hat\theta^{(N)} - \theta) = \operatorname{trace}(G(\theta) V^{(N)}(\theta))$.

One may expect the risk of good measurements-and-estimators to decrease
like $N^{-1}$ as $N \to \infty$. The quantum Cramer-Rao bound confirms that
this is the best rate to hope for: it states that for unbiased estimators of
a p-dimensional parameter $\theta$, based on arbitrary joint measurements on N
copies,

$$N R^{(N)}(\theta) \ge \mathcal{C}_G(\theta) = \inf_{\vec X, V :\, V \ge Z(\vec X)} \operatorname{trace}(G(\theta) V) \tag{7}$$

where $\vec X = (X_1, \dots, X_p)$, the $X_i$ are $d \times d$ self-adjoint matrices
satisfying $\partial/\partial\theta_i \operatorname{trace}(\rho(\theta) X_j) = \delta_{ij}$;
$Z$ is the $p \times p$ self-adjoint matrix with elements
$\operatorname{trace}(\rho(\theta) X_i X_j)$; and $V$ is a real symmetric matrix. It is
possible to solve the optimization over $V$ for given $\vec X$ leading to the formula

$$\mathcal{C}_G(\theta) = \inf_{\vec X} \operatorname{trace}\Bigl( \Re\bigl(G^{1/2} Z(\vec X) G^{1/2}\bigr) + \operatorname{abs} \Im\bigl(G^{1/2} Z(\vec X) G^{1/2}\bigr) \Bigr) \tag{8}$$

where $G = G(\theta)$. The absolute value of a matrix is found by diagonalising it
and taking absolute values of the eigenvalues. We’ll assume that the bound
is finite, i.e., there exists $\vec X$ satisfying the constraints. A sufficient
condition for this is that the Helstrom quantum information matrix $H$ introduced in
(17) below is nonsingular.
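
To make the expression inside the infimum of (8) concrete, the sketch below evaluates it for one particular choice of $\vec X$ in the full qubit model. It is an illustration only, assuming NumPy; the names `holevo_objective` and `psd_sqrt`, the sample point, and the choice $X_i = \sigma_i - \theta_i \mathbf{1}$ are assumptions made here, and the value obtained is the objective at this one feasible $\vec X$, not the infimum itself.

```python
import numpy as np

SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def psd_sqrt(a):
    """Square root of a symmetric non-negative matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def holevo_objective(rho, xs, g):
    """trace(Re W + abs Im W) with W = G^{1/2} Z G^{1/2}, Z_ij = trace(rho X_i X_j)."""
    z = np.array([[np.trace(rho @ xi @ xj) for xj in xs] for xi in xs])
    g_half = psd_sqrt(g)
    w = g_half @ z @ g_half
    # Im W is real antisymmetric; the trace of its absolute value equals
    # the sum of its singular values.
    trace_abs_im = np.linalg.svd(w.imag, compute_uv=False).sum()
    return float(np.trace(w.real) + trace_abs_im)

theta = np.array([0.3, -0.2, 0.5])
rho = (np.eye(2) + sum(t * s for t, s in zip(theta, SIGMA))) / 2
# Assumed feasible choice: d/d theta_i trace(rho(theta) X_j) = trace(sigma_i X_j) / 2 = delta_ij.
xs = [s - t * np.eye(2) for t, s in zip(theta, SIGMA)]
value = holevo_objective(rho, xs, np.eye(3))
```

With $G = \mathbf{1}$ this choice gives $Z_{ij} = \delta_{ij} - \theta_i\theta_j + i\sum_k \epsilon_{ijk}\theta_k$, so the objective equals $3 - \|\theta\|^2 + 2\|\theta\|$; since (8) is an infimum, this number is an upper bound on $\mathcal{C}_G(\theta)$ at the chosen point.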

For specific interesting models, it often turns out not to be difficult to compute
the bound $\mathcal{C}_G(\theta)$. Note, it is a bound which depends only on the density
matrix of one system (N = 1) and its derivative with respect to the
parameter, and on the loss function, both at the given point $\theta$. It can be
found by solving a finite-dimensional optimization problem.

We will not be concerned with the specific form of the bound. What we
are going to need, are just two key properties. 

Firstly: the bound is local, and applies to the larger class of locally
unbiased estimators. This means to say that at the given point $\theta$,
$E_\theta \hat\theta^{(N)} = \theta$, and at this point also
$\partial/\partial\theta_i E_\theta \hat\theta_j^{(N)} = \delta_{ij}$. Now, it is well
known that the “estimator” $\theta_0 + I(\theta_0)^{-1} S(\theta_0)$, where $I(\theta)$
is Fisher information and $S(\theta)$ is score function, is locally unbiased at
$\theta = \theta_0$ and achieves the Cramer-Rao bound there. Thus the Cramer-Rao bound
for locally unbiased estimators is sharp. Consequently, we can rewrite the bound (7)
in the form (2) announced above, where $\bar I_M^{(N)}$ is the average (divided by N)
Fisher information in the outcome of an arbitrary measurement $M = M^{(N)}$ on N
copies and the right hand side is defined in (7) or (8).


Fact to remember 2. We have a family of computable lower bounds on the
inverse average Fisher information matrix for an arbitrary measurement on
N copies, given by (2) and (7) or (8).

Secondly, for given $\theta$, define the following two sets of positive-definite
symmetric real matrices, in one-to-one correspondence with one another through
the mapping “matrix inverse”. The matrices $G$ occurring in the definition
are also taken to be positive-definite symmetric real.

$$\mathcal{V} = \{ V : \operatorname{trace}(G V) \ge \mathcal{C}_G \ \ \forall G \}, \tag{9}$$

$$\mathcal{I} = \{ I : \operatorname{trace}(G I^{-1}) \ge \mathcal{C}_G \ \ \forall G \}. \tag{10}$$

In the appendix to this paper, we give an algebraic proof that the set
$\mathcal{I}$ is convex (for $\mathcal{V}$, convexity is obvious), and that the
inequalities defining $\mathcal{V}$ define supporting hyperplanes to that convex set,
i.e., all the inequalities are achievable in $\mathcal{V}$, or equivalently
$\mathcal{C}_G = \inf_{V \in \mathcal{V}} \operatorname{trace}(G V)$.

In fact, these properties have a statistical explanation, connected to the
fact that the quantum statistical problem of collective measurements on N
identical quantum systems approaches a quantum Gaussian problem as
$N \to \infty$, see Guta and Kahn (2006). It can be shown (Hayashi, 2003; Hayashi,
personal communication; Guta, 2005, unpublished manuscript), that $\mathcal{V}$
consists of all covariance matrices of locally unbiased estimators achievable (by
suitable choice of measurement) on a certain p-parameter quantum Gaussian
statistical model. The inequalities defining $\mathcal{V}$ are the Holevo bounds for
that model, and each of those bounds is attainable. Thus, for each $G$, there exists
a $V \in \mathcal{V}$ achieving equality in
$\operatorname{trace}(G V) \ge \mathcal{C}_G$. It follows from this that
$\mathcal{I}$ consists of all non-singular information matrices together with any
non-singular matrix smaller than some information matrix, achievable by choice
of measurement on the same quantum Gaussian model. Consider the set
of information matrices attainable by some measurement together with all
smaller matrices; and consider the set of variance matrices of locally
unbiased estimators based on arbitrary measurements. Note that adding zero
mean noise to a locally unbiased estimator preserves its local unbiasedness,
so adding larger matrices to this set does not change it. The set of
information matrices is convex: choosing measurement 1 with probability p and
measurement 2 with probability q = 1 - p (and remembering your choice) gives a
measurement whose Fisher information is the convex combination of the
informations of measurements 1 and 2. Augmenting the set with all matrices
smaller than something in the set preserves convexity. (The set of
variances of locally unbiased estimators is convex, by a similar randomization
argument). Putting this together, we obtain

Fact to remember 3. For given $\theta$, both $\mathcal{V}$ and $\mathcal{I}$ defined
in (9) and (10) are convex, and all the inequalities defining these sets are
achieved by points in the sets.

See the appendix for a direct algebraic proof. 
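
The randomization argument for convexity of the set of information matrices can be illustrated in the simplest scalar case. The sketch below assumes NumPy; the one-parameter qubit model, the constant `R`, the weight `w` and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def fisher_information(prob_fn, theta, eps=1e-6):
    """Fisher information of a finite-outcome model theta -> prob_fn(theta), scalar theta."""
    p = prob_fn(theta)
    dp = (prob_fn(theta + eps) - prob_fn(theta - eps)) / (2 * eps)  # central difference
    return float(np.sum(dp ** 2 / p))

# Assumed model: qubit with Bloch vector R * (sin(theta), 0, cos(theta)), R < 1.
R = 0.8

def probs_z(theta):
    """Outcome distribution when measuring sigma_3."""
    c = R * np.cos(theta)
    return np.array([(1 + c) / 2, (1 - c) / 2])

def probs_x(theta):
    """Outcome distribution when measuring sigma_1."""
    s = R * np.sin(theta)
    return np.array([(1 + s) / 2, (1 - s) / 2])

def probs_randomized(theta, w=0.3):
    """Choose sigma_3 with probability w, else sigma_1, and record the choice."""
    return np.concatenate([w * probs_z(theta), (1 - w) * probs_x(theta)])

theta0 = 0.4
i_z = fisher_information(probs_z, theta0)
i_x = fisher_information(probs_x, theta0)
i_mix = fisher_information(probs_randomized, theta0)
```

Because the recorded outcome includes which measurement was used, `i_mix` equals `0.3 * i_z + 0.7 * i_x` up to rounding, which is the convex combination described in the text.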


3 An asymptotic Bayesian information bound 

We will now introduce the van Trees inequality, a Bayesian Cramer-Rao
bound, and combine it with the Holevo bound (2) via derivation of a dual
bound following from the convexity of the sets (9) and (10). We return to the
problem of estimating the (real, column) vector function $\psi(\theta)$ of the (real,
column) vector parameter $\theta$ of a state $\rho(\theta)$ based on collective
measurements of N identical copies. The dimensions of $\psi$ and of $\theta$ need not
be the same. The sample size N is largely suppressed from the notation. Let $V$ be
the mean square error matrix of an arbitrary estimator $\hat\psi$, thus
$V(\theta) = E_\theta (\hat\psi - \psi(\theta))(\hat\psi - \psi(\theta))^\top$.
Often, but not necessarily, we’ll have $\hat\psi = \psi(\hat\theta)$ for some
estimator $\hat\theta$ of $\theta$. Suppose we have a quadratic loss function
$(\hat\psi - \psi(\theta))^\top \tilde G(\theta) (\hat\psi - \psi(\theta))$
where $\tilde G$ is a positive-definite matrix function of $\theta$, then the Bayes
risk with respect to a given prior $\pi$ can be written
$R(\pi) = E_\pi \operatorname{trace} \tilde G V$. We are going
to prove the following theorem:

Theorem 1. Suppose $\rho(\theta) : \theta \in \Theta \subseteq \mathbb{R}^p$ is a
smooth quantum statistical model and suppose $\pi$ is a smooth prior density on a
compact subset $\Theta_0 \subseteq \Theta$, such that $\Theta_0$ has a piecewise
smooth boundary, on which $\pi$ is zero. Suppose moreover the quantity
$\mathcal{J}(\pi)$ defined in (15) below, is finite. Then

$$\liminf_{N \to \infty} N R^{(N)}(\pi) \ge E_\pi \mathcal{C}_{G_0} \tag{11}$$

where $G_0 = \psi'^\top \tilde G \psi'$ (and assumed to be positive-definite), $\psi'$
is the matrix of partial derivatives of elements of $\psi$ with respect to those of
$\theta$, and $\mathcal{C}_{G_0}$ is defined by (7) or (8).

“Once continuously differentiable” is enough smoothness. Smoothness of
the quantum statistical model implies smoothness of the classical statistical
model following from applying an arbitrary measurement to N copies of the
quantum state. Slightly weaker but more elaborate smoothness conditions
on the statistical model and prior are spelled out in Gill and Levit (1995).
The restriction that $G_0$ be non-singular can probably be avoided by a more
detailed analysis.

Let $\bar I_M$ denote the average Fisher information matrix for $\theta$ based on a
given collective measurement on the N copies. Then the van Trees inequality
states that for all matrix functions $C$ of $\theta$, of size
$\dim(\psi) \times \dim(\theta)$,

$$N E_\pi \operatorname{trace} \tilde G V \ge
\frac{\bigl(E_\pi \operatorname{trace} C \psi'^\top\bigr)^2}
{E_\pi \operatorname{trace} \tilde G^{-1} C \bar I_M C^\top
 + \dfrac{1}{N} E_\pi \dfrac{(C\pi)'^\top \tilde G^{-1} (C\pi)'}{\pi^2}} \tag{12}$$

where the primes in $\psi'$ and in $(C\pi)'$ both denote differentiation, but in the
first case converting the vector $\psi$ into the matrix of partial derivatives of
elements of $\psi$ with respect to elements of $\theta$, of size
$\dim(\psi) \times \dim(\theta)$, in the second case converting the matrix $C\pi$ into
the column vector, of the same length as $\psi$, with row elements
$\sum_j (\partial/\partial\theta_j)(C\pi)_{ij}$. To get an optimal bound
we need to choose $C(\theta)$ cleverly.

First though, note that the Fisher information appears in the denominator
of the van Trees bound. This is a nuisance since what we have is Holevo’s lower
bound (2) to the inverse Fisher information. We would like to have an upper
bound on the information itself, say of the form (6), together with a recipe
for computing $\mathcal{C}^K$.

All this can be obtained from the convexity of the sets $\mathcal{I}$ and
$\mathcal{V}$ defined in (10) and (9) and the non-redundancy of the inequalities
appearing in their definitions. Suppose $V_0$ is a boundary point of $\mathcal{V}$.
Define $I_0 = V_0^{-1}$. Thus $I_0$ (though not necessarily an attainable average
information matrix $\bar I_M^{(N)}$) satisfies the Holevo bound for each
positive-definite $G$, and attains equality in one of them, say with $G = G_0$. In
the language of convex sets, and “in the $V$-picture”,
$\operatorname{trace} G_0 V = \mathcal{C}_{G_0}$ is a supporting hyperplane to
$\mathcal{V}$ at $V = V_0$.

Under the mapping “matrix-inverse” the hyperplane
$\operatorname{trace} G_0 V = \mathcal{C}_{G_0}$ in the $V$-picture maps to the smooth
surface $\operatorname{trace} G_0 I^{-1} = \mathcal{C}_{G_0}$ touching the set
$\mathcal{I}$ at $I_0$ in the $I$-picture. Since $\mathcal{I}$ is convex, the tangent
plane to the smooth surface at $I = I_0$ must be a supporting hyperplane to
$\mathcal{I}$ at this point. The matrix derivative of the operation of matrix
inversion can be written $dA^{-1}/dx = -A^{-1}(dA/dx)A^{-1}$. This tells us that the
equation of the tangent plane is
$\operatorname{trace} G_0 I_0^{-1} I I_0^{-1} = \operatorname{trace} G_0 I_0^{-1} = \mathcal{C}_{G_0}$.
Since this is simultaneously a supporting hyperplane to $\mathcal{I}$ we deduce that
for all $I \in \mathcal{I}$,
$\operatorname{trace} G_0 I_0^{-1} I I_0^{-1} \le \mathcal{C}_{G_0}$. Defining
$K_0 = I_0^{-1} G_0 I_0^{-1}$ and $\mathcal{C}^{K_0} = \mathcal{C}_{G_0}$ we rewrite
this inequality as $\operatorname{trace} K_0 I \le \mathcal{C}^{K_0}$.
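
The differentiation step behind the tangent plane can be checked numerically. The sketch below assumes NumPy; the randomly generated matrices stand in for $G_0$, $I_0$ and a direction of perturbation, and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """A random symmetric positive-definite matrix (illustrative stand-in)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

n = 3
G0, I0 = random_spd(n), random_spd(n)
D = rng.standard_normal((n, n))
D = (D + D.T) / 2  # symmetric direction in which I0 is perturbed

I0_inv = np.linalg.inv(I0)
K0 = I0_inv @ G0 @ I0_inv

def f(I):
    """The function I -> trace(G0 I^{-1}) whose level set is the smooth surface."""
    return np.trace(G0 @ np.linalg.inv(I))

# Directional derivative of f at I0 along D: finite differences versus the
# formula d(A^{-1}) = -A^{-1} (dA) A^{-1}, which gives -trace(K0 D).
eps = 1e-6
numeric = (f(I0 + eps * D) - f(I0 - eps * D)) / (2 * eps)
analytic = -np.trace(K0 @ D)
```

Since the gradient of $I \mapsto \operatorname{trace} G_0 I^{-1}$ at $I_0$ is $-K_0$, the tangent plane at $I_0$ is $\{I : \operatorname{trace} K_0 I = \operatorname{trace} K_0 I_0\}$, and $\operatorname{trace} K_0 I_0 = \operatorname{trace} G_0 I_0^{-1}$, matching the equation in the text.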

A similar story can be told when we start in the $I$-picture with a supporting
hyperplane (at $I = I_0$) to $\mathcal{I}$ of the form
$\operatorname{trace} K_0 I = \mathcal{C}^{K_0}$ for some symmetric positive-definite
$K_0$. It maps to the smooth surface
$\operatorname{trace} K_0 V^{-1} = \mathcal{C}^{K_0}$, with tangent plane
$\operatorname{trace} K_0 V_0^{-1} V V_0^{-1} = \mathcal{C}^{K_0}$ at
$V = V_0 = I_0^{-1}$. By strict convexity of the function “matrix inverse”, the
tangent plane touches the smooth surface only at the point $V_0$. Moreover, the
smooth surface lies above the tangent plane, but below $\mathcal{V}$. This makes
$V_0$ the unique minimizer of $\operatorname{trace} K_0 V_0^{-1} V V_0^{-1}$ in
$\mathcal{V}$.

It would be useful to extend these computations to allow singular $I$, $G$
and $K$. Anyway, we summarize what we have so far in a theorem.

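Theorem 2. Dual to the Holevo family of lower bounds on average inverse
information, $\operatorname{trace} G \bar I_M^{-1} \ge \mathcal{C}_G$ for each
positive-definite $G$, we have a family of upper bounds on information,

$$\operatorname{trace} K \bar I_M \le \mathcal{C}^K \quad \text{for each } K. \tag{13}$$

If $I_0 \in \mathcal{I}$ satisfies
$\operatorname{trace} G_0 I_0^{-1} = \mathcal{C}_{G_0}$ then with
$K_0 = I_0^{-1} G_0 I_0^{-1}$, $\mathcal{C}^{K_0} = \mathcal{C}_{G_0}$.
Conversely if $I_0 \in \mathcal{I}$ satisfies
$\operatorname{trace} K_0 I_0 = \mathcal{C}^{K_0}$ then with $G_0 = I_0 K_0 I_0$,
$\mathcal{C}_{G_0} = \mathcal{C}^{K_0}$. Moreover, none of the bounds is redundant, in
the sense that for all positive-definite $G$ and $K$,
$\mathcal{C}_G = \inf_{V \in \mathcal{V}} \operatorname{trace}(G V)$ and
$\mathcal{C}^K = \sup_{I \in \mathcal{I}} \operatorname{trace}(K I)$.
The minimizer in the first equation is unique.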

Now we are ready to apply the van Trees inequality. First we make a guess
for what the left hand side of (12) should look like, at its best. Suppose we
use an estimator $\hat\psi = \psi(\hat\theta)$ where $\hat\theta$ makes optimal use of
the information in the measurement $M$. Denote now by $\bar I_M$ the asymptotic
normalized Fisher information of a sequence of measurements. Then we expect that the
asymptotic normalized covariance matrix $V$ of $\hat\psi$ is equal to
$\psi' \bar I_M^{-1} \psi'^\top$ and therefore the asymptotic normalized Bayes risk
should be
$E_\pi \operatorname{trace} \tilde G \psi' \bar I_M^{-1} \psi'^\top = E_\pi \operatorname{trace} \psi'^\top \tilde G \psi' \bar I_M^{-1}$.
This is bounded below by the integrated Holevo bound $E_\pi \mathcal{C}_{G_0}$ with
$G_0 = \psi'^\top \tilde G \psi'$. Let $I_0 \in \mathcal{I}$ satisfy
$\operatorname{trace} G_0 I_0^{-1} = \mathcal{C}_{G_0}$; its existence and uniqueness
are given by Theorem 2. (Heuristically we expect that $I_0$ is asymptotically
attainable). By the same Theorem, with $K_0 = I_0^{-1} G_0 I_0^{-1}$,
$\mathcal{C}^{K_0} = \mathcal{C}_{G_0} = \operatorname{trace} G_0 I_0^{-1} = \operatorname{trace} \tilde G \psi' I_0^{-1} \psi'^\top$.

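Though these calculations are informal, they lead us to try the matrix
function $C = \tilde G \psi' I_0^{-1}$. Define $V_0 = I_0^{-1}$. With this choice, in
the numerator of the van Trees inequality, we find the square of
$\operatorname{trace} C \psi'^\top = \operatorname{trace} \tilde G \psi' V_0 \psi'^\top = \operatorname{trace} G_0 V_0 = \mathcal{C}_{G_0}$.
In the main term of the denominator, we find
$\operatorname{trace} \tilde G^{-1} \tilde G \psi' I_0^{-1} \bar I_M I_0^{-1} \psi'^\top \tilde G = \operatorname{trace} I_0^{-1} G_0 I_0^{-1} \bar I_M = \operatorname{trace} K_0 \bar I_M \le \mathcal{C}^{K_0} = \mathcal{C}_{G_0}$
by the dual Holevo bound (13). Thus the numerator of the van Trees bound is the
square of $E_\pi \mathcal{C}_{G_0}$, which is an upper bound on this part of the
denominator, and using the inequality $a^2/(a + b) \ge a - b$ we find

$$N E_\pi \operatorname{trace} \tilde G V \ge E_\pi \mathcal{C}_{G_0} - \frac{1}{N} \mathcal{J}(\pi) \tag{14}$$

where

$$\mathcal{J}(\pi) = E_\pi \frac{(C\pi)'^\top \tilde G^{-1} (C\pi)'}{\pi^2} \tag{15}$$

with $C = \tilde G \psi' V_0$ and $V_0$ uniquely achieving in $\mathcal{V}$ the bound
$\operatorname{trace} G_0 V \ge \mathcal{C}_{G_0}$, where
$G_0 = \psi'^\top \tilde G \psi'$. Finally, provided $\mathcal{J}(\pi)$ is finite
(which depends on the prior distribution and on properties of the model), we obtain
the asymptotic lower bound

$$\liminf_{N \to \infty} N E_\pi \operatorname{trace} \tilde G V \ge E_\pi \mathcal{C}_{G_0}. \tag{16}$$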


4 Examples 

In the three examples discussed here, the loss function is derived from a
figure-of-merit in state estimation that is very popular among physicists,
called fidelity. Suppose we wish to estimate a state $\rho = \rho(\theta)$ by
$\hat\rho = \rho(\hat\theta)$. Fidelity measures the closeness of the two
states, attaining its maximal value 1 when the estimate and truth coincide.
It is defined as
$\mathrm{Fid}(\hat\rho, \rho) = \bigl(\operatorname{trace}\sqrt{\rho^{1/2}\hat\rho\,\rho^{1/2}}\bigr)^{2}$
(some authors would call this squared fidelity). When both states are pure,
thus $\rho = |\phi\rangle\langle\phi|$ and $\hat\rho = |\hat\phi\rangle\langle\hat\phi|$
where $\phi$ and $\hat\phi$ are unit vectors in $\mathbb{C}^d$, then
$\mathrm{Fid}(\hat\phi, \phi) = |\langle\hat\phi|\phi\rangle|^{2}$. There is an
important characterization of fidelity due to Fuchs (1995) which both
explains its meaning and leads to many important properties. Suppose $M$ is
a measurement on the quantum system. Denote by $M(\rho)$ the probability
distribution of the outcome of the measurement $M$ when applied to a state
$\rho$. For two probability distributions $\hat P$, $P$ on the same sample
space, let $\hat p$ and $p$ be their densities with respect to a dominating
measure $\mu$ and define the fidelity between these probability measures as
$\mathrm{Fid}(\hat P, P) = \bigl(\int \hat p^{1/2} p^{1/2}\, d\mu\bigr)^{2}$. In usual
statistical language, this is the squared Hellinger affinity between the two
probability measures. It turns out that
$\mathrm{Fid}(\hat\rho, \rho) = \inf_M \mathrm{Fid}(M(\hat\rho), M(\rho))$, thus two
states have small fidelity when there is a measurement which distinguishes
them well, in the sense that the Hellinger affinity between the outcome
distributions is small, or in other words, the $L^2$ distance between the
root densities of the data under the two models is large.
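
A minimal numerical sketch of these definitions, assuming NumPy. The helper
names (`psd_sqrt`, `fidelity`, `classical_fidelity`, `random_state`) are
hypothetical, and the measurement used is simply the one in the standard
basis. Because the quantum fidelity is an infimum over all measurements, it
should not exceed the classical fidelity obtained from any one measurement.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)              # clear tiny negative rounding errors
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho_hat, rho):
    """Fid(rho_hat, rho) = (trace sqrt(rho^(1/2) rho_hat rho^(1/2)))^2."""
    s = psd_sqrt(rho)
    inner = s @ rho_hat @ s
    inner = (inner + inner.conj().T) / 2   # symmetrise against rounding
    w = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(w)) ** 2)

def classical_fidelity(p_hat, p):
    """Squared Hellinger affinity of two discrete distributions."""
    return float(np.sum(np.sqrt(p_hat * p)) ** 2)

def random_state(d, rng):
    """A random full-rank density matrix (illustrative only)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
rho_hat, rho = random_state(3, rng), random_state(3, rng)

# Measuring in the standard basis gives the diagonals as outcome probabilities.
p_hat, p = np.diag(rho_hat).real, np.diag(rho).real

fid_quantum = fidelity(rho_hat, rho)
fid_classical = classical_fidelity(p_hat, p)
print(fid_quantum, fid_classical)  # the first should not exceed the second
```

Other measurements can be tried by rotating both states with the same
unitary before taking the diagonals.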

Now suppose states are smoothly parametrized by a vector parameter
$\theta$. Consider the fidelity between two states with close-by parameter
values $\hat\theta$ and $\theta$, and suppose they are measured with the same
measurement $M$. From the relation
$\int \hat p^{1/2} p^{1/2}\, d\mu = 1 - \frac12 \|\hat p^{1/2} - p^{1/2}\|^{2}$ and by a
Taylor expansion to second order one finds
$1 - \mathrm{Fid}(\hat P, P) \approx \frac14 (\hat\theta - \theta)^{\top} I_M(\theta)(\hat\theta - \theta)$
where $I_M(\theta)$ is the Fisher information in the outcome of the
measurement $M$ on the state $\rho(\theta)$. We will define the Helstrom
quantum information matrix $H(\theta)$ by the analogous relation

$$1 - \mathrm{Fid}(\hat\rho, \rho) \approx \tfrac14 (\hat\theta - \theta)^{\top} H(\theta) (\hat\theta - \theta). \qquad (17)$$

It turns out that $H(\theta)$ is the smallest "information matrix" such that
$I_M(\theta) \le H(\theta)$ for all measurements $M$.
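
The classical expansion above can be checked numerically in the simplest
case. The sketch below assumes a hypothetical two-outcome measurement whose
outcome is Bernoulli with success probability $\theta$, so that the Fisher
information is $1/(\theta(1-\theta))$; the function names are illustrative.

```python
import math

def classical_fidelity_bernoulli(t_hat, t):
    """Squared Hellinger affinity between Bernoulli(t_hat) and Bernoulli(t)."""
    return (math.sqrt(t_hat * t) + math.sqrt((1 - t_hat) * (1 - t))) ** 2

def fisher_information_bernoulli(t):
    """Fisher information of one Bernoulli(t) observation."""
    return 1.0 / (t * (1.0 - t))

def loss_ratio(theta, delta):
    """Exact 1 - Fid divided by its quadratic approximation (1/4) delta^2 I."""
    exact = 1.0 - classical_fidelity_bernoulli(theta + delta, theta)
    approx = 0.25 * delta ** 2 * fisher_information_bernoulli(theta)
    return exact / approx

for delta in (1e-1, 1e-2, 1e-3):
    print(delta, loss_ratio(0.3, delta))  # ratios approach 1 as delta shrinks
```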

Taking as loss function
$l(\hat\theta, \theta) = 1 - \mathrm{Fid}(\rho(\hat\theta), \rho(\theta))$ we would
expect (by a quadratic approximation to the loss) that
$\mathbb{E}_\pi \mathcal{C}_{\frac14 H}$ is a sharp asymptotic lower bound on $N$
times the Bayes risk. We will prove this result for a number of special
cases, in which by a fortuitous circumstance, the fidelity-loss function is
exactly quadratic in a (sometimes rather strange) function of the
parameter. The first two examples concern a two-dimensional quantum system
and are treated in depth in Bagan et al. (2006a); below we just outline some
important features of the application. In the second of those two examples
our asymptotic lower bound is an essential part of a proof of asymptotic
optimality of a certain measurement-and-estimation scheme.

The third example concerns an unknown pure state of arbitrary dimension.
Here we present a short and geometric proof of a surprising but little
known result of Hayashi (1998) which shows that an extraordinarily simple
measurement scheme leads to an asymptotically optimal estimator (provided
the data is processed efficiently). The analysis also links the previously
unconnected Holevo and Gill-Massar bounds (Holevo, 1982; Gill and Massar,
2000).


4.1 Completely unknown spin half ($d = 2$, $p = 3$)

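Recall that a completely unknown 2-dimensional quantum state can be written
$\rho(\theta) = \frac12(\mathbf{1} + \theta_1\sigma_1 + \theta_2\sigma_2 + \theta_3\sigma_3)$,
where $\theta$ lies in the unit ball in $\mathbb{R}^3$. It turns out that
$\mathrm{Fid}(\hat\rho, \rho) = \frac12\bigl(1 + \hat\theta\cdot\theta + (1 - \|\hat\theta\|^2)^{1/2}(1 - \|\theta\|^2)^{1/2}\bigr)$.
Define $\psi(\theta)$ to be the four-dimensional vector obtained by adjoining
$(1 - \|\theta\|^2)^{1/2}$ to $\theta_1, \theta_2, \theta_3$. Note that this vector
has constant length 1. It follows that
$1 - \mathrm{Fid}(\hat\rho, \rho) = \frac14\|\hat\psi - \psi\|^2$. This is a quadratic
loss-function for estimation of $\psi(\theta)$ with $\tilde G = \frac14\mathbf{1}$,
a multiple of the $4 \times 4$ identity matrix. By Taylor expansion of both
sides, we find that $\frac14 H = \psi'^{\top}\tilde G\psi' = G$ (the matrix called
$G_0$ in (16)) and conclude from Theorem 1 that $N$ times $1 -$ mean fidelity
is indeed asymptotically lower bounded by $\mathbb{E}_\pi \mathcal{C}_{\frac14 H}$.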
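
The two identities of this paragraph can be verified numerically. The sketch
below assumes NumPy and uses hypothetical helper names; it computes the
fidelity directly from its definition and compares it with the closed form
and with the quadratic loss in $\psi$.

```python
import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def rho_of(theta):
    """rho(theta) = (1 + theta . sigma) / 2 for theta in the unit ball."""
    return 0.5 * (np.eye(2, dtype=complex)
                  + sum(t * s for t, s in zip(theta, PAULI)))

def psd_sqrt(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho_hat, rho):
    """(trace sqrt(rho^(1/2) rho_hat rho^(1/2)))^2, from the definition."""
    s = psd_sqrt(rho)
    inner = s @ rho_hat @ s
    inner = (inner + inner.conj().T) / 2
    w = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(w)) ** 2)

def psi_of(theta):
    """Adjoin sqrt(1 - |theta|^2) to theta, giving a unit vector in R^4."""
    return np.append(theta, np.sqrt(1.0 - theta @ theta))

def random_ball_point(rng, max_radius=0.9):
    v = rng.normal(size=3)
    return rng.uniform(0.0, max_radius) * v / np.linalg.norm(v)

rng = np.random.default_rng(0)
theta_hat, theta = random_ball_point(rng), random_ball_point(rng)

fid = fidelity(rho_of(theta_hat), rho_of(theta))
closed_form = 0.5 * (1 + theta_hat @ theta
                     + np.sqrt((1 - theta_hat @ theta_hat) * (1 - theta @ theta)))
quadratic_loss = 0.25 * np.sum((psi_of(theta_hat) - psi_of(theta)) ** 2)
print(fid, closed_form, 1 - fid, quadratic_loss)
```

The radius is capped at 0.9 only to keep the numerical matrix square roots
well conditioned; the identities themselves hold on the whole ball.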




In Bagan, Ballester, Gill, Monras and Muñoz-Tapia (2006a) the exactly
optimal measurement-and-estimation scheme is derived and analysed in the
case of a rotationally invariant prior distribution over the unit ball. The
optimal measurement turns out not to depend on the (arbitrary) radial part
of the prior distribution, and separates into two parts, one used for
estimating the direction $\theta/\|\theta\|$, the other part for estimating the
length $\|\theta\|$. The Bayes optimal estimator of the length of $\theta$
naturally depends on the prior. Because of these simplifications it is
feasible to compute the asymptotic value of $N$ times the (optimal) Bayes
mean fidelity-loss, and this value is $(3 + 2\mathbb{E}_\pi\|\theta\|)/4$.

The Helstrom quantum information matrix $H$ and the Holevo lower bound
$\mathcal{C}_{\frac14 H}$ are also computed. It turns out that
$\mathcal{C}_{\frac14 H}(\theta) = (3 + 2\|\theta\|)/4$. Our asymptotic lower bound
is not only correct but also, as expected, sharp.

The van Trees approach does put some non-trivial conditions on the prior
density $\pi$. The most restrictive conditions are that the density is zero
at the boundary of its support and that the quantity (15) be finite. Within
the unit ball everything is smooth, but there are some singularities at the
boundary of the ball. So our main theorem does not apply directly to many
priors of interest. However there is an easy approximation argument to
extend its scope, as follows.

Suppose we start with a prior $\pi$ supported by the whole unit ball which
does not satisfy the conditions. For any $\epsilon > 0$ construct
$\tilde\pi = \tilde\pi_\epsilon$ which is smaller than $(1 + \epsilon)\pi$
everywhere, and 0 for $\|\theta\| \ge 1 - \delta$ for some $\delta > 0$. If the
original prior $\pi$ is smooth enough we can arrange that $\tilde\pi$
satisfies the conditions of the van Trees inequality, and makes (15) finite.
$N$ times the Bayes risk for $\tilde\pi$ cannot exceed $1 + \epsilon$ times that
for $\pi$, and the same must also be true for their limits. Finally,
$\mathbb{E}_{\tilde\pi_\epsilon} \mathcal{C}_{\frac14 H} \to \mathbb{E}_\pi \mathcal{C}_{\frac14 H}$
as $\epsilon \to 0$.

Some last remarks on this example: first of all, it is known that only
collective measurements can asymptotically achieve this bound. Separate
measurements on separate systems lead to strictly worse estimators. In fact,
by the same methods one can obtain the sharp asymptotic lower bound
9/4 (independent of the prior), see Bagan, Ballester, Gill, Muñoz-Tapia and
Romero-Isart (2006b), when one allows the measurement on the $n$th system
to depend on the data obtained from the earlier ones. Instead of the Holevo
bound itself, we use here a bound of Gill and Massar (2000), which actually
has the form of a dual Holevo bound. (We give some more remarks on this
at the end of the discussion of the third example.) Secondly, our result
gives strong heuristic support to the claim that the
measurement-and-estimation scheme developed in Bagan, Ballester, Gill,
Monras and Muñoz-Tapia (2006a) for a specific prior and specific loss
function is also pointwise optimal in a minimax sense, or among regular
estimators, for loss functions which are locally equivalent to
fidelity-loss; and also asymptotically optimal in the Bayes sense for other
priors and locally equivalent loss functions. In general, if the physicists'
approach is successful in the sense of generating a
measurement-and-estimation scheme which can be analytically studied and
experimentally implemented, then this scheme will have (for large $N$) good
properties independent of the prior and only dependent on local properties
of the loss.

4.2 Spin half: equatorial plane ($d = 2$, $p = 2$)

Bagan, Ballester, Gill, Monras and Muñoz-Tapia (2006a) also considered the
case where it is known that $\theta_3 = 0$, thus we now have a two-dimensional
parameter. The prior is again taken to be rotationally symmetric. The
exactly Bayes optimal measurement turns out (at least, for some $N$ and
for some priors) to depend on the radial part of the prior. Analysis of the
exactly optimal measurement-and-estimation procedure is not feasible since
we do not know if this phenomenon persists for all $N$. However there is a
natural measurement, which is exactly optimal for some $N$ and some priors,
which one might conjecture to be asymptotically optimal for all priors. This
sub-optimal measurement, combined with the Bayes optimal estimator given
the measurement, can be analysed and it turns out that $N$ times $1 -$ mean
fidelity converges to 1/2 as $N \to \infty$, independently of the prior. Again,
the Helstrom quantum information matrix $H$ and the Holevo lower bound
$\mathcal{C}_{\frac14 H}$ are computed. It turns out that
$\mathcal{C}_{\frac14 H}(\theta) = 1/2$. This time we can use our asymptotic lower
bound to prove that the natural sub-optimal measurement-and-estimator is in
fact asymptotically optimal for this problem.

For a $p$-parameter model the best one could ever hope for is that for
large $N$ there are measurements with $\bar I_M^{(N)}$ approaching the Helstrom
upper bound $H$. Using this bound in the van Trees inequality gives the
asymptotic lower bound on $N$ times $1 -$ mean fidelity of
$\operatorname{trace} \frac14 H H^{-1} = p/4$. The example here is a special case
where this is attainable. Such a model is called quasi-classical.

If one restricts attention to separate measurements on separate systems
the sharp asymptotic lower bound is 1, twice as large, see Bagan, Ballester,
Gill, Muñoz-Tapia and Romero-Isart (2006b).

4.3 Completely unknown d dimensional pure state 

In this example we make use of the dual Holevo bound and symmetry
arguments to show that the original Holevo bound for a natural choice of
$G$ (corresponding to fidelity-loss) is attained by an extremely large class
of measurements, including one of the most basic measurements around,
known as "standard tomography".

For a pure state $\rho = |\phi\rangle\langle\phi|$, fidelity can be written
$|\langle\hat\phi|\phi\rangle|^2$ where $|\phi\rangle \in \mathbb{C}^d$ is a vector of
unit length. The state-vector can be multiplied by $e^{ia}$ for an arbitrary
real phase $a$ without changing the density matrix. The constraint of unit
length and the arbitrariness of the phase means that one can parametrize
the density matrix $\rho$ corresponding to $|\phi\rangle$ by $2(d-1)$ real
parameters which we take to be our underlying vector parameter $\theta$ (we
have $d$ real parts and $d$ imaginary parts of the elements of
$|\phi\rangle$, but one constraint and one parameter which can be fixed
arbitrarily).

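For a pure state, $\rho^2 = \rho$ so $\operatorname{trace}(\rho^2) = 1$. Another way to
write the fidelity in this case is as
$\operatorname{trace}(\hat\rho\rho) = \sum_{ij}\bigl(\Re(\hat\rho_{ij})\Re(\rho_{ij}) + \Im(\hat\rho_{ij})\Im(\rho_{ij})\bigr)$.
So if we take $\psi(\theta)$ to be the vector with $2d^2$ components, and of
length 1, containing the real and the imaginary parts of elements of $\rho$
we see that $1 - \mathrm{Fid}(\hat\rho, \rho) = \frac12\|\hat\psi - \psi\|^2$. It
follows that $1 -$ fidelity is a quadratic loss function in $\psi(\theta)$ with
$\tilde G$ again proportional to the identity matrix (here
$\tilde G = \frac12\mathbf{1}$).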
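
A short numerical check of this representation, assuming NumPy; the
dimension `d = 4` and the helper names are arbitrary illustrative choices.

```python
import numpy as np

def random_unit_vector(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def psi_of(rho):
    """Stack real and imaginary parts of all entries of rho (2 d^2 numbers)."""
    return np.concatenate([rho.real.ravel(), rho.imag.ravel()])

d = 4  # illustrative dimension
rng = np.random.default_rng(0)
phi_hat, phi = random_unit_vector(d, rng), random_unit_vector(d, rng)
rho_hat = np.outer(phi_hat, phi_hat.conj())
rho = np.outer(phi, phi.conj())

fid = abs(np.vdot(phi_hat, phi)) ** 2          # |<phi_hat|phi>|^2
trace_form = np.trace(rho_hat @ rho).real      # trace(rho_hat rho)
psi_hat, psi = psi_of(rho_hat), psi_of(rho)
half_sq_dist = 0.5 * np.sum((psi_hat - psi) ** 2)
print(fid, trace_form, 1 - fid, half_sq_dist)
```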

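Define again the Helstrom quantum information matrix $H(\theta)$ for $\theta$ by
$1 - \mathrm{Fid}(\hat\rho, \rho) \approx \frac14(\hat\theta - \theta)^{\top} H(\theta)(\hat\theta - \theta)$.
Just as in the previous two examples we expect the asymptotic lower bound
$\mathbb{E}_\pi \mathcal{C}_{\frac14 H}$ to hold for $N$ times Bayes mean
fidelity-loss, where $G = \frac14 H = \psi'^{\top}\tilde G\psi'$.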

Some striking facts are known about estimation of a pure state. First of
all, from Matsumoto (2002), we know that the Holevo bound is attainable,
for all $G$, already at $N = 1$. Secondly, from Gill and Massar (2000) we
have the following inequality

$$\operatorname{trace} H^{-1} \bar I_M^{(N)} \le d - 1 \qquad (18)$$

with equality (in the case that the state is completely unknown) for all
exhaustive measurements $M^{(N)}$ on $N$ copies of the state. Exhaustivity
means, for a measurement with discrete outcome space, that $M^{(N)}(\{x\})$
is a rank one matrix for each outcome $x$. The meaning of exhaustivity in
general is given by the same property for the density $m(x)$ of the
matrix-valued measure $M^{(N)}$ with respect to a real dominating measure,
e.g., $\operatorname{trace}(M^{(N)}(\cdot))$. This tells us that (18) is one of the
"dual Holevo inequalities". We can associate it with an original Holevo
inequality once we know an information matrix of a measurement attaining
the bound. We will show that there is an information matrix of the form
$\bar I_M^{(N)} = cH$ attaining the bound. Since the number of parameters (and
dimension of $H$) is $2(d - 1)$ it follows by imposing equality in (18) that
$c = \frac12$.

The corresponding Holevo inequality must be
$\operatorname{trace} \frac12 H H^{-1} \frac12 H (\bar I_M^{(N)})^{-1} \ge d - 1$, that is,
$\operatorname{trace} \frac14 H (\bar I_M^{(N)})^{-1} \ge d - 1$, which tells us that
$\mathcal{C}_{\frac14 H} = d - 1$.
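
The two pieces of arithmetic in this step can be written out explicitly.
The only fact used is that $H$ is a nonsingular matrix of dimension
$2(d-1)$, so that the trace of the corresponding identity matrix is
$2(d-1)$.

```latex
\operatorname{trace} H^{-1}(cH) = c \cdot 2(d-1) = d-1
  \quad\Longrightarrow\quad c = \tfrac12 ,
\qquad
\operatorname{trace} \tfrac14 H \bigl(\tfrac12 H\bigr)^{-1}
  = \tfrac12 \cdot 2(d-1) = d-1 .
```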

The proof uses an invariance property of the model. For any unitary
matrix $U$ (i.e., $UU^* = U^*U = \mathbf{1}$) we can convert the pure state
$\rho$ into a new pure state $U\rho U^*$. The unitary matrices form a group
under multiplication. Consequently the group can be thought to act on the
parameter $\theta$ used to describe the pure state. Clearly the fidelity
between two states (or the fidelity between their two parameters) is
invariant when the same unitary acts on both states. This group action
possesses the "homogeneous two point property": for any two pairs of states
such that the fidelities between the members of each pair are the same,
there is a unitary transforming the first pair into the second pair.

We illustrate this in the case $d = 2$ where (first example, section 2), the
pure states can be represented by the surface of the unit ball in
$\mathbb{R}^3$. It turns out that the action of the unitaries on the density
matrices translates into the action of the group of orthogonal rotations on
the unit sphere. Two points at equal distance on the sphere can be
transformed by some rotation into any other two points at the same distance
from one another; a constant distance between points on the sphere
corresponds to a constant fidelity between the underlying states.

In general, the pure states of dimension $d$ can be identified with the
Riemannian manifold $\mathbb{CP}^{d-1}$ whose natural Riemannian metric
corresponds locally to fidelity (locally, $1 -$ fidelity is squared
Riemannian distance) and whose isometries correspond to the unitaries. This
space possesses the homogeneous two point property, as we argued above. It
is easy to show that the only Riemannian metrics invariant under isometries
on such a space are proportional to one another. Hence the quadratic forms
generating those metrics with respect to a particular parametrization must
also be proportional to one another.

Consider a measurement whose outcome is actually an estimate of the
state, and suppose that this measurement is covariant under the unitaries.
This means that transforming the state by a unitary, doing the measurement
on the transformed state, and transforming the estimate back by the inverse
of the same unitary, is the same (has the same POVM) as the original
measurement. The information matrix for such a measurement is generated from
the squared Hellinger affinity between the distributions of the measurement
outcomes under two nearby states, just as the Helstrom information matrix
is generated from the fidelity between the states. If the measurement is
covariant then the Riemannian metric defined by the information matrix of
the measurement outcome must be invariant under unitary transformations
of the states. Hence: the information matrix of any covariant measurement
is proportional to the Helstrom information matrix.

Exhaustive covariant measurements certainly do exist. A particularly
simple one is that, for each of the $N$ copies of the quantum system, we
independently and uniformly choose a basis of $\mathbb{C}^d$ and perform the
simple measurement (given in an example in Section 2) corresponding to that
basis.

The first conclusion of all this is: any exhaustive covariant measurement
$M^{(N)}$ has information matrix $\bar I_M^{(N)}$ equal to one half the Helstrom
information matrix. All such measurements attain the Holevo bound
$\operatorname{trace} \frac14 H (\bar I_M^{(N)})^{-1} \ge d - 1$. In particular, this holds
for the i.i.d. measurement based on repeatedly choosing a uniformly
distributed random basis of $\mathbb{C}^d$.

The second conclusion is that an asymptotic lower bound on $N$ times
$1 -$ mean fidelity is $d - 1$. Now the exactly Bayes optimal
measurement-and-estimation strategy is known to achieve this bound. The
measurement involved is a mathematically elegant collective measurement on
the $N$ copies together, but hard to realise in the laboratory. Our results
show that one can expect to asymptotically attain the bound by decent
information processing (maximum likelihood? optimal Bayes with uniform
prior and fidelity loss?) following an arbitrary exhaustive covariant
measurement, of which the most simple to implement is the standard
tomography measurement consisting of an independent random choice of
measurement basis for each separate system.

In Gill and Massar (2000) the same bound as (18) was shown to hold
for separable (and in particular, for adaptive sequential) measurements also
in the mixed state case. Moreover in the case $d = 2$, any information
matrix satisfying the bound is attainable already at $N = 1$. This is used in
Bagan et al. (2006b) to obtain sharp asymptotic bounds to mean fidelity for
separable measurements on mixed qubits.


Appendix: proof of convexity


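The first step is to show that

$$\mathcal{V} = \operatorname{clos}\{V : V \ge Z(\mathbf{X}) \text{ for some } \mathbf{X}\} \qquad (19)$$

where, as before, $\mathbf{X} = (X_1, \dots, X_p)$, the $X_i$ are $d \times d$
self-adjoint matrices satisfying
$\partial/\partial\theta_i \operatorname{trace}(\rho(\theta) X_j) = \delta_{ij}$; $Z$ is the
$p \times p$ self-adjoint matrix with elements
$\operatorname{trace}(\rho(\theta) X_i X_j)$; and $V$ is a real symmetric matrix.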

An easy computation shows that
$Z(p\mathbf{X} + (1-p)\mathbf{Y}) \le pZ(\mathbf{X}) + (1-p)Z(\mathbf{Y})$, where here
$p \in [0, 1]$ is a mixing weight (check that the second derivative w.r.t.
$p$ of $\langle\psi|Z(p\mathbf{X} + (1-p)\mathbf{Y})|\psi\rangle$ is nonnegative, for
any complex vector $\psi$). This makes
$\{V : V \ge Z(\mathbf{X}) \text{ for some } \mathbf{X}\}$, where $V$ is self-adjoint,
a convex set. Restricting to the real matrices in this set preserves
convexity, as does taking the closure of the set. By convexity, the
definition of $\mathcal{C}_G$ tells us that the equations
$\operatorname{trace}(GV) = \mathcal{C}_G$ define supporting hyperplanes to the set
defined on the right hand side of (19). Since a closed convex set is the
intersection of the closed halfspaces defined by its supporting
hyperplanes, it follows that $\mathcal{V}$ as defined by (9) can also be
specified as (19), and that all the Holevo bounds
$\operatorname{trace}(GV) \ge \mathcal{C}_G$ are attained in $\mathcal{V}$.

The convexity of $\mathcal{I}$, the set of inverses of elements of
$\mathcal{V}$, is a lot more subtle. In the following argument I will suppose
that the state $\rho(\theta)$ is strictly positive. The proof is easily adapted
to the case of a model for a pure state. (More generally we need the notion
of $D$-invariant model and the $L^2$ spaces defined by a quantum state, see
Holevo, 1982 or Hayashi and Matsumoto, 2004.)


We can consider our model with $p$ parameters and a strictly positive
density matrix as a submodel of the model of a completely unknown mixed
state, which has $d^2 - 1$ parameters. Denote the parameter vector of the
full model by $\phi$. The submodel is parametrized by $\theta$, a subvector of
$\phi$. I'll use the terminology interest parameter, nuisance parameter for
the two subvectors of $\phi$ corresponding to submodel parameters and
auxiliary parameters. Subscripts 1, 2 will also be used when we partition
matrices or vectors according to these two parts. By the strict positivity
of $\rho$ we are working at a point in the interior of the full model (this
is one of the reasons why the argument needs to be adapted for a pure-state
model). Since $\operatorname{trace}\rho = 1$, the partial derivatives of $\rho$ with
respect to the components of $\theta$ in the submodel and $\phi$ in the full
model are traceless (i.e., have trace zero). It is easy to see from this
that we may restrict the elements $X$ of $\mathbf{X}$, entering into the Holevo
bounds for the submodel, and elements $Y$ of $\mathbf{Y}$, entering into the
Holevo bounds for the full model, to be such that
$\operatorname{trace}\rho Y = 0$ (and likewise $\operatorname{trace}\rho X = 0$). Such $Y$
form a $d^2 - 1$ dimensional real Hilbert space $L^2_0(\rho)$ under the inner
product $\langle X, Y\rangle_\rho = \Re\operatorname{trace}\rho XY$. Let $\rho'_i$ denote
the partial derivative of $\rho$ with respect to $\theta_i$ at the fixed
parameter value under consideration. For the submodel, define the symmetric
logarithmic derivatives $\lambda_i \in L^2_0(\rho)$ by
$\langle\lambda_i, X\rangle_\rho = \operatorname{trace}\rho'_i X$ for all
$X \in L^2_0(\rho)$. The constraints $\operatorname{trace}\rho'_i X_j = \delta_{ij}$
translate into constraints $\langle\lambda_i, X_j\rangle_\rho = \delta_{ij}$ for all
$i, j \le p$. In the full model, I'll use the notation $\boldsymbol\mu$ for the
vector of symmetric logarithmic derivatives, and $\mathbf{Y}$ for a candidate
vector of $Y_i$, each of length $d^2 - 1$. Of course, $\boldsymbol\lambda$ is a
subvector of $\boldsymbol\mu$. In the full model, the constraints on
$\mathbf{Y}$ translate into $\langle\mu_i, Y_j\rangle_\rho = \delta_{ij}$ for all
$i, j \le d^2 - 1$. The $\mu_i$ are linearly independent and so form a basis
of $L^2_0(\rho)$.

Now in the full model, the constraints on the $Y_i$ make them uniquely
defined. Thus for the full model, the set $\mathcal{V}_{\rm full}$ is the set of
all $(d^2 - 1) \times (d^2 - 1)$ real matrices $W$ exceeding the fixed
self-adjoint matrix $Z_{\rm full} = Z(\mathbf{Y})$. Unfortunately, $Z_{\rm full}$ is
singular. But we may describe $\mathcal{I}_{\rm full}$ as the closure of the set
of all real matrices less than or equal to
$(Z_{\rm full} + \lambda\mathbf{1})^{-1}$ for some $\lambda > 0$. The convexity of
both sets is trivial. This suggests that we try to deal with the case of a
$p$ parameter model by considering it a submodel of the full $d^2 - 1$
parameter model.

The relation between inverse information matrices for full models and
submodels is complicated, but that between the information matrices
themselves is simple: the information matrix for a submodel is a submatrix
of the information matrix of a full model. Thus we might conjecture that for
every $I \in \mathcal{I}$, there exists a $W \ge Z_{\rm full}$ such that
$I \le (W^{-1})_{11}$, the subscript "11" indicating the submodel submatrix.
However, it could be that we have positive information for the submodel
parameters, but zero information for the auxiliary parameters. This would
make the corresponding inverse information matrix $W$ for the full model
undefined. This problem can be solved by approximating singular information
matrices by nonsingular ones. We will prove the following theorem:

Theorem 3. $V^{-1} \in \mathcal{I}$ if and only if there exist real matrices
$W^{(n)} \ge Z_{\rm full}$, with
$((W^{(n)})^{-1})_{11} = (V^{(n)})^{-1} \to V^{-1}$ as $n \to \infty$.

In words, $\mathcal{I}$ is the closure of the set of 11 submatrices of real
symmetric nonsingular matrices less than or equal to
$(Z_{\rm full} + \lambda\mathbf{1})^{-1}$ for some $\lambda > 0$. Consequently
$\mathcal{I}$ is convex.

Proof. The proof will work by frequent reparametrizations of the nuisance
part of the full model. By this we mean that $\phi$ is transformed smoothly
and one-to-one into, say, $\psi$, in such a way that the interest component
of $\phi$ is unaltered. Under such a transformation, the vector of symmetric
logarithmic derivatives $\boldsymbol\mu$ transforms by premultiplication by an
invertible matrix $C$ whose 11 block is the identity and whose 12 block is
zero, so the "interest" part of $\boldsymbol\mu$ is unchanged. (Subject to $C$
being nonsingular, for which it is just necessary that the 22 block is
nonsingular, the 21 block of $C$ can be arbitrary.) At the same time the
vector of operators $\mathbf{Y}$ transforms by premultiplication by the
transposed inverse of $C$. Consequently, $Z_{\rm full}$ is transformed into
$C^{-\top} Z_{\rm full} C^{-1}$, any $W \ge Z_{\rm full}$ is transformed the same
way, while $W^{-1}$ is transformed into $C W^{-1} C^{\top}$. We therefore see that
the 11 block (i.e., the submatrix corresponding to the submodel) of $W^{-1}$
remains invariant under reparametrization of the auxiliary or nuisance
parameters.
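
The block invariance just described is a matter of linear algebra and can be
checked numerically. The sketch below assumes NumPy; the dimensions `p` and
`q` and the random matrices are hypothetical stand-ins, not objects from a
particular quantum model. It also exhibits one choice of the 21 block of $C$
that makes the 12 block of the transformed $W^{-1}$ vanish, the kind of
reparametrization invoked later for the backwards implication.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 3                  # interest / nuisance dimensions (illustrative)
n = p + q

def random_spd(k, rng):
    """A random real symmetric positive definite k x k matrix."""
    a = rng.normal(size=(k, k))
    return a @ a.T + np.eye(k)

W = random_spd(n, rng)       # stand-in for a real matrix W >= Z_full
A = np.linalg.inv(W)         # W^{-1}

C = np.eye(n)                          # 11 block identity, 12 block zero
C[p:, p:] = random_spd(q, rng)         # nonsingular 22 block
C[p:, :p] = rng.normal(size=(q, p))    # arbitrary 21 block

C_inv = np.linalg.inv(C)
W_new = C_inv.T @ W @ C_inv            # W transforms like Z_full
A_new = np.linalg.inv(W_new)           # should equal C A C^T

# A specific 21 block that zeroes the 12 block of the transformed inverse.
C0 = C.copy()
C0[p:, :p] = -C0[p:, p:] @ A[p:, :p] @ np.linalg.inv(A[:p, :p])
A_block_diag = C0 @ A @ C0.T

print(np.abs(A_new[:p, :p] - A[:p, :p]).max())   # 11 block unchanged
print(np.abs(A_block_diag[:p, p:]).max())        # 12 block vanishes
```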

In the statement of the theorem the choice of parametrization of the
auxiliary parameters is arbitrary, and so can be chosen in any convenient
way. We take advantage of this possibility immediately, in the proof of the
forwards implication of the theorem.

Suppose $V \ge Z(\mathbf{X})$ for some $\mathbf{X}$ satisfying the usual
constraints. Augment $\boldsymbol\lambda$ to a vector $\boldsymbol\mu$ of
$d^2 - 1$ linearly independent elements $\mu_i \in L^2_0(\rho)$ such that
$\langle\mu_i, X_j\rangle_\rho = \delta_{ij}$ for all $i \le d^2 - 1$, $j \le p$. (The
extra elements can be an arbitrary basis of the orthocomplement of the
$X_j$; it is easy to check that together with the old elements they are
linearly independent, hence because of their number, a basis.) Next augment
$\mathbf{X}$ to $\mathbf{Y}$, so that the orthogonality relation, with $X_j$
replaced by $Y_j$, also holds for $p < j \le d^2 - 1$.

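For square matrices $A$, $B$ write $\operatorname{diag}(A, B)$ for the block
diagonal matrix with $A$ and $B$ as diagonal blocks corresponding to
interest and nuisance parts of the full model. Let
$D_\epsilon = \operatorname{diag}(\mathbf{1}, \epsilon\mathbf{1})$; this is the diagonal
matrix with 1's on the interest parameter part of the diagonal, $\epsilon$'s
on the nuisance part.

We have
$D_\epsilon Z_{\rm full} D_\epsilon \to \operatorname{diag}(Z(\mathbf{X}), 0) \le \operatorname{diag}(V, 0)$
as $\epsilon \to 0$. Therefore, for each $\epsilon > 0$ we can find
$\delta = \delta(\epsilon) > 0$ such that
$D_\epsilon Z_{\rm full} D_\epsilon \le \operatorname{diag}(V, 0) + \delta\mathbf{1}$ and
moreover such that $\delta \to 0$ as $\epsilon \to 0$. Thus for each $\epsilon$,
$Z_{\rm full} \le D_\epsilon^{-1}(\operatorname{diag}(V, 0) + \delta\mathbf{1})D_\epsilon^{-1} = W_\epsilon$,
where $((W_\epsilon)^{-1})_{11} \to V^{-1}$ as $\epsilon \to 0$.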
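
The construction of this step can be illustrated numerically. The sketch
below assumes NumPy and replaces the true full-model matrix by a hypothetical
random singular Hermitian positive semidefinite stand-in `Z_full`; the choice
`delta = max(excess, 0) + eps` is just one convenient way of obtaining a
positive $\delta(\epsilon)$ that tends to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 2, 3                                # illustrative block sizes
n = p + q

# Stand-in for Z_full: Hermitian, positive semidefinite and singular.
b = rng.normal(size=(n, n - 1)) + 1j * rng.normal(size=(n, n - 1))
Z_full = b @ b.conj().T
Z_X = Z_full[:p, :p]                       # plays the role of Z(X)

# A real symmetric V with V >= Z(X).
V = (np.linalg.eigvalsh(Z_X).max() + 0.5) * np.eye(p)
diag_V0 = np.zeros((n, n))
diag_V0[:p, :p] = V                        # diag(V, 0)

def w_epsilon(eps):
    """Return W_eps = D^{-1} (diag(V, 0) + delta 1) D^{-1} and delta(eps)."""
    D = np.diag(np.r_[np.ones(p), eps * np.ones(q)])
    excess = np.linalg.eigvalsh(D @ Z_full @ D - diag_V0).max()
    delta = max(excess, 0.0) + eps         # positive, tends to 0 with eps
    D_inv = np.diag(1.0 / np.diag(D))
    W = D_inv @ (diag_V0 + delta * np.eye(n)) @ D_inv
    return W, delta

for eps in (1e-1, 1e-2, 1e-3):
    W, delta = w_epsilon(eps)
    gap = np.linalg.eigvalsh(W - Z_full).min()   # nonnegative: W >= Z_full
    err = np.linalg.norm(np.linalg.inv(W)[:p, :p] - np.linalg.inv(V))
    print(eps, delta, gap, err)                  # err shrinks with eps
```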

Choosing a sequence $\epsilon_n \to 0$ as $n \to \infty$ we have found
$W^{(n)} \ge Z_{\rm full}$ for all $n$ with $((W^{(n)})^{-1})_{11} \to V^{-1}$ as
$n \to \infty$. Going back to the original parametrization does not alter
$((W^{(n)})^{-1})_{11}$ so the forwards implication of the theorem is proved.

Now for the backwards implication. Suppose I am given $W \ge Z_{\rm full}$,
$(W^{-1})_{11} = V^{-1}$. Reparametrize the nuisance part of the full model so
that $(W^{-1})_{12} = 0$. This does not alter $(W^{-1})_{11}$ but does alter both
interest and nuisance parts of $\mathbf{Y}$. Denote the interest part of the
transformed $\mathbf{Y}$ by $\mathbf{X}$. The inequality $W \ge Z_{\rm full}$ remains
true after the transformation, hence $W_{11} \ge Z(\mathbf{X})$. Since $W$ is
block diagonal, we obtain from this
$(W^{-1})_{11} \le (Z(\mathbf{X}) + \delta\mathbf{1})^{-1}$ for some $\delta > 0$. Taking
the closure completes the proof. □